{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(sphx_glr_tutorial_autotvm_relay_x86)=\n",
    "# 用 Python 接口编译和优化模型（AutoTVM）\n",
    "\n",
    "**原作者**: [Chris Hoge](https://github.com/hogepodge>)\n",
    "\n",
    "在 [TVMC 教程](tvmc_command_line_driver) 中，介绍了如何使用 TVM 的命令行界面 TVMC 来编译、运行和微调预训练的视觉模型 ResNet-50 v2。不过，TVM 不仅仅是命令行工具，它也是优化框架，其 API 可用于许多不同的语言，在处理机器学习模型方面给你带来巨大的灵活性。\n",
    "\n",
    "在本教程中，将涵盖与 TVMC 相同的内容，但展示如何用 Python API 来完成它。完成本节后，将使用 TVM 的 Python API 来完成以下任务：\n",
    "\n",
    "- 编译预训练的 ResNet-50 v2 模型供 TVM 运行时使用。\n",
    "- 使用编译后的模型，运行真实图像，并解释输出和评估模型性能。\n",
    "- 使用 TVM 在 CPU 上调度该模型。\n",
    "- 使用 TVM 收集的调度数据重新编译已优化的模型。\n",
    "- 通过优化后的模型运行图像，并比较输出和模型的性能。\n",
    "\n",
    "本节的目的是让你了解 TVM 的能力以及如何通过 Python API 使用它们。\n",
    "\n",
    "TVM 是一个深度学习编译器框架，有许多不同的模块可用于处理深度学习模型和算子。在本教程中，我们将研究如何使用 Python API 加载、编译和优化一个模型。\n",
    "\n",
    "首先要导入一些依赖关系，包括用于加载和转换模型的 ``mxnet``，用于下载测试数据的辅助工具，用于处理图像数据的 Python 图像库，用于图像数据预处理和后处理的 ``numpy``，TVM Relay 框架，以及 TVM Graph Executor。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from PIL import Image\n",
    "import numpy as np\n",
    "from tvm.contrib.download import download_testdata\n",
    "import tvm\n",
    "from tvm import relay\n",
    "from tvm.contrib import graph_executor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 下载和加载前端模型\n",
    "\n",
    "在本教程中，使用 ResNet-50 v2。ResNet-50 是卷积神经网络，有 50 层深度，旨在对图像进行分类。该模型已经在超过一百万张图片上进行了预训练，有 1000 种不同的分类。该网络的输入图像大小为 224x224。\n",
    "\n",
    "```{note}\n",
    "如果你有兴趣探索更多关于 ResNet-50 模型的结构，建议下载免费的 ML 模型查看器 [Netron](https://netron.app)。\n",
    "```\n",
    "\n",
    "TVM 提供了辅助库来下载预训练的模型。通过该模块提供模型的 URL、文件名和模型类型，TVM 将下载模型并保存到磁盘。\n",
    "\n",
    "```{admonition} 与其他模型格式一起工作\n",
    "TVM 支持许多流行的模型格式。清单可以在 TVM 文档的 [编译深度学习模型](tutorial-frontend) 部分找到。\n",
    "```\n",
    "\n",
    "````{note}\n",
    "可以直接使用如下方式下载预训练的模型（以 ONNX 为例）：\n",
    "\n",
    "```python\n",
    "model_url = \"\".join(\n",
    "    [\n",
    "        \"https://github.com/onnx/models/raw/\",\n",
    "        \"master/vision/classification/resnet/model/\",\n",
    "        \"resnet50-v2-7.onnx\",\n",
    "    ]\n",
    ")\n",
    "\n",
    "model_path = download_testdata(model_url, \"resnet50-v2-7.onnx\", module=\"onnx\")\n",
    "```\n",
    "````\n",
    "\n",
    "MXNet 可直接载入模型："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mxnet.gluon.model_zoo import vision\n",
    "\n",
    "model_name = 'resnet50_v2'\n",
    "gluon_model = vision.get_model(model_name, pretrained=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 下载、预处理和加载测试图像\n",
    "\n",
    "当涉及到预期的张量形状、格式和数据类型时，每个模型都很特别。出于这个原因，大多数模型需要一些预处理和后处理，以确保输入是有效的，并解释输出。TVMC 对输入和输出数据都采用了 NumPy 的 ``.npz`` 格式。\n",
    "\n",
    "作为本教程的输入，将使用一只猫的图像，但你可以自由地用你选择的任何图像来代替这个图像。\n",
    "\n",
    "<img src=\"https://s3.amazonaws.com/model-server/inputs/kitten.jpg\" height=\"224px\" width=\"224px\" align=\"center\">\n",
    "\n",
    "下载图像数据，然后将其转换成 numpy 数组，作为模型的输入。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "img_url = \"https://s3.amazonaws.com/model-server/inputs/kitten.jpg\"\n",
    "img_path = download_testdata(img_url, \"imagenet_cat.png\", module=\"data\")\n",
    "\n",
    "# resize 到 224x224\n",
    "with Image.open(img_path) as im:\n",
    "    resized_image = im.resize((224, 224))\n",
    "# 转换为 float32\n",
    "img_data = np.asarray(resized_image).astype(\"float32\")\n",
    "# 输入图像是在 HWC 布局，而 MXNet 期望 CHW 输入\n",
    "img_data = np.transpose(img_data, (2, 0, 1))\n",
    "# 根据 ImageNet 输入规范进行 Normalize\n",
    "imagenet_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))\n",
    "imagenet_stddev = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))\n",
    "norm_img_data = (img_data / 255 - imagenet_mean) / imagenet_stddev\n",
    "# 添加批处理维度，设置数据为 4 维 输入：NCHW\n",
    "img_data = np.expand_dims(norm_img_data, axis=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 用 Relay 编译模型\n",
    "\n",
    "下一步是编译 ResNet 模型。使用 {func}`~tvm.relay.frontend.from_mxnet` 导入器将模型导入到 {mod}`~tvm.relay`。\n",
    "\n",
    "不同的模型类型，输入的名称可能不同。你可以使用 Netron 这样的工具来检查输入名称。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "input_name = \"data\"\n",
    "shape_dict = {input_name: img_data.shape}\n",
    "\n",
    "mod, params = relay.frontend.from_mxnet(gluon_model, shape_dict)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "将模型与标准优化一起构建成 TVM 库。\n",
    "\n",
    "```{admonition} 定义正确的目标\n",
    "指定正确的目标可以对编译后的模块的性能产生巨大影响，因为它可以利用目标上可用的硬件特性。欲了解更多信息，请参考为 [x86 CPU 自动调整卷积网络](tune_relay_x86)。建议确定你运行的是哪种 CPU，以及可选的功能，并适当地设置目标。例如，对于某些处理器， `target = \"llvm -mcpu=skylake\"`，或者对于具有 AVX-512 向量指令集的处理器， `target = \"llvm-mcpu=skylake-avx512\"`。\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.\n"
     ]
    }
   ],
   "source": [
    "target = \"llvm\"\n",
    "with tvm.transform.PassContext(opt_level=3):\n",
    "    lib = relay.build(mod, target=target, params=params)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "从该库中创建 TVM graph 运行时模块。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "dev = tvm.device(str(target), 0)\n",
    "module = graph_executor.GraphModule(lib[\"default\"](dev))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 在 TVM 运行时上执行\n",
    "\n",
    "已经编译了模型，下面可以使用 TVM 运行时来进行预测。要使用 TVM 来运行模型并进行预测，需要两样东西：\n",
    "\n",
    "- 编译后的模型，也就是我们刚刚制作的模块 `module`。\n",
    "- 对模型的有效输入，以便进行预测。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "module.set_input(input_name, img_data)\n",
    "module.run()\n",
    "output_shape = (1, 1000)\n",
    "tvm_output = module.get_output(0).numpy()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 收集基本性能数据\n",
    "\n",
    "想收集一些与这个未优化的模型相关的基本性能数据，并在以后与调整后的模型进行比较。为了帮助说明 CPU 的噪音，在多个批次的重复中运行计算，然后收集一些关于平均值、中位数和标准差的基础统计数据。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'mean': 31.967710580211133, 'median': 31.849463004618883, 'std': 0.2711562455644498}\n"
     ]
    }
   ],
   "source": [
    "import timeit\n",
    "\n",
    "timing_number = 10\n",
    "timing_repeat = 10\n",
    "unoptimized = (\n",
    "    np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))\n",
    "    * 1000\n",
    "    / timing_number\n",
    ")\n",
    "unoptimized = {\n",
    "    \"mean\": np.mean(unoptimized),\n",
    "    \"median\": np.median(unoptimized),\n",
    "    \"std\": np.std(unoptimized),\n",
    "}\n",
    "\n",
    "print(unoptimized)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 对输出进行后处理\n",
    "\n",
    "如前所述，每个模型都有自己提供输出张量的特殊方式。\n",
    "\n",
    "在案例中，需要运行一些后处理，利用为模型提供的查找表，将 ResNet-50 v2 的输出渲染成更适合人类阅读的形式。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "class='tiger cat' with probability=0.526644\n",
      "class='tabby, tabby cat' with probability=0.403282\n",
      "class='Egyptian cat' with probability=0.036493\n",
      "class='tiger, Panthera tigris' with probability=0.004262\n",
      "class='plastic bag' with probability=0.002360\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/media/pc/data/tmp/cache/conda/envs/xi/lib/python3.10/site-packages/gluoncv/__init__.py:40: UserWarning: Both `mxnet==1.9.1` and `torch==2.0.0` are installed. You might encounter increased GPU memory footprint if both framework are used at the same time.\n",
      "  warnings.warn(f'Both `mxnet=={mx.__version__}` and `torch=={torch.__version__}` are installed. '\n"
     ]
    }
   ],
   "source": [
    "from scipy.special import softmax\n",
    "from gluoncv.data.imagenet.classification import ImageNet1kAttr\n",
    "\n",
    "# 获取 ImageNet 标签列表\n",
    "imagenet_1k_attr = ImageNet1kAttr()\n",
    "labels = imagenet_1k_attr.classes_long\n",
    "# 获取输出张量\n",
    "scores = softmax(tvm_output)\n",
    "scores = np.squeeze(scores)\n",
    "ranks = np.argsort(scores)[::-1]\n",
    "for rank in ranks[0:5]:\n",
    "    print(f\"class='{labels[rank]}' with probability={scores[rank]:f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 调优模型\n",
    "\n",
    "之前的模型是为了在 TVM 运行时工作而编译的，但不包括任何特定平台的优化。在本节中，将向你展示如何使用 TVM 建立针对你工作平台的优化模型。\n",
    "\n",
    "在某些情况下，当使用编译的模块运行推理时，可能无法获得预期的性能。在这种情况下，可以利用自动调谐器，为模型找到更好的配置，获得性能的提升。TVM 中的调谐是指对模型进行优化以在给定目标上更快地运行的过程。这与训练或微调不同，因为它不影响模型的准确性，而只影响运行时的性能。作为调优过程的一部分，TVM 将尝试运行许多不同的算子实现变体，以观察哪些算子表现最佳。这些运行的结果被储存在调优记录文件中。\n",
    "\n",
    "在最简单的形式下，调优需要你提供三样东西：\n",
    "\n",
    "- 你打算在上面运行这个模型的设备的目标规格\n",
    "- 输出文件的路径，调优记录将被存储在该文件中\n",
    "- 要调优的模型的路径\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tvm.autotvm.tuner import XGBTuner\n",
    "from tvm import autotvm"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "{class}`~tvm.autotvm.measure.measure_methods.LocalRunner` 在本地设备上运行生成的代码。\n",
    "\n",
    "{class}`~tvm.autotvm.measure.measure_methods.LocalRunner` 使用一组特定参数生成的编译代码，并度量它的性能。\n",
    "\n",
    "- ``timeout``: 为每个测试的配置运行训练代码的时间设置了上限。\n",
    "- ``number``: 运行生成的代码求平均值的次数。称这些运行为一次 `repeat` 测量。\n",
    "- ``repeat``(可选): 重复测量的次数。总的来说，生成的代码将运行 $1 + \\text{number} \\times \\text{repeat}$ 次，其中第一次是热身并将被丢弃。返回的结果包含 `repeat` 成本，每个成本都是 ``number`` 成本的平均值。\n",
    "- ``min_repeat_ms``(可选): 一次 `repeat` 的最小持续时间（以毫秒为单位）。默认情况下，一次 `repeat` 包含 `number` 次运行。如果设置了该参数，参数 `number` 将动态调整，以满足一次 `repeat` 的最小持续时间要求。即，当一次 `repeat` 的运行时间低于此时间时，`number` 参数将自动增加。\n",
    "- ``cooldown_interval``(可选): 两次测量之间的冷却间隔。\n",
    "- ``enable_cpu_cache_flush``: 是否在重复测量之间刷新 CPU 缓存。在端到端推断过程中，刷新缓存可以使一个算子的测量延迟（latency）更接近其实际延迟。为了使这个选项有效，参数 `number` 也应该设置为 `1`。这只对 CPU 任务有效。\n",
    "\n",
    "```{tip}\n",
    "这是“伪”本地模式。为用户启动 silent rpc tracker 和 rpc server。通过这种方式，可以在 RPC 基础结构中重用 timeout/isolation 机制。\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "number = 10\n",
    "repeat = 1\n",
    "min_repeat_ms = 0  # 在 CPU 上调优，可以设置为 0\n",
    "timeout = 10  # 单位：秒\n",
    "\n",
    "# 创建 TVM runner\n",
    "runner = autotvm.LocalRunner(\n",
    "    number=number,\n",
    "    repeat=repeat,\n",
    "    timeout=timeout,\n",
    "    min_repeat_ms=min_repeat_ms,\n",
    "    enable_cpu_cache_flush=True,\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "创建简单的结构来保存调谐选项。使用 XGBoost 算法来指导搜索。\n",
    "\n",
    "对于生产作业来说，你会想把试验的数量设置得比这里使用的 20 的值大。对于 CPU，推荐 1500，对于 GPU，推荐 3000-4000。所需的试验次数可能取决于特定的模型和处理器，因此值得花一些时间来评估各种数值的性能，以找到调整时间和模型优化之间的最佳平衡。因为运行调谐是需要时间的，我们将试验次数设置为 20 次，但不建议使用这么小的值。\n",
    "\n",
    "- ``early_stopping`` 参数是在应用提前停止搜索的条件之前，要运行的最小 `trails`。\n",
    "- ``measure`` 选项表示将在哪里建立 trial 代码，以及将在哪里运行。在这种情况下，使用刚刚创建的 ``LocalRunner`` 和 ``LocalBuilder``。\n",
    "- ``tuning_records`` 选项指定了文件来写入调优数据。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "tuning_option = {\n",
    "    \"tuner\": \"xgb\",\n",
    "    \"trials\": 20, # 对于 CPU，推荐 1500，对于 GPU，推荐 3000-4000。\n",
    "    \"early_stopping\": 100,\n",
    "    \"measure_option\": autotvm.measure_option(\n",
    "        builder=autotvm.LocalBuilder(build_func=\"default\"), runner=runner\n",
    "    ),\n",
    "    \"tuning_records\": \"build/resnet-50-v2-autotuning.json\",\n",
    "}"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{admonition} 定义调谐搜索算法\n",
    "默认情况下，这种搜索是使用 XGBoost 网格算法指导的。根据你的模型的复杂性和可用的时间量，你可能想选择一个不同的算法。\n",
    "```\n",
    "\n",
    "```{admonition} 设置调谐参数\n",
    "在这个例子中，为了节省时间，将试验次数和提前停止设置为 20 和 100。如果你把这些值设置得更高，你可能会看到更多的性能改进，但这是以花时间调整为代价的。收敛所需的试验次数将取决于模型和目标平台的具体情况。\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Task  1/25]  Current/Best:  260.45/ 260.45 GFLOPS | Progress: (20/20) | 14.52 s Done.\n",
      "[Task  2/25]  Current/Best:   87.00/ 179.41 GFLOPS | Progress: (20/20) | 9.63 s Done.\n",
      "[Task  3/25]  Current/Best:  199.70/ 211.49 GFLOPS | Progress: (20/20) | 10.89 s Done.\n",
      "[Task  4/25]  Current/Best:  100.88/ 168.99 GFLOPS | Progress: (20/20) | 15.53 s Done.\n",
      "[Task  5/25]  Current/Best:   87.72/ 237.24 GFLOPS | Progress: (20/20) | 9.88 s Done.\n",
      "[Task  6/25]  Current/Best:  104.50/ 324.75 GFLOPS | Progress: (20/20) | 12.50 s Done.\n",
      "[Task  7/25]  Current/Best:  101.97/ 189.49 GFLOPS | Progress: (20/20) | 9.84 s Done.\n",
      "[Task  8/25]  Current/Best:  136.84/ 208.09 GFLOPS | Progress: (20/20) | 15.52 s Done.\n",
      "[Task 10/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s s Done.\n",
      "[Task 10/25]  Current/Best:   90.49/ 139.27 GFLOPS | Progress: (20/20) | 10.54 s Done.\n",
      "[Task 11/25]  Current/Best:  101.51/ 196.03 GFLOPS | Progress: (20/20) | 9.68 s Done.\n",
      "[Task 12/25]  Current/Best:  122.38/ 189.81 GFLOPS | Progress: (20/20) | 14.59 s Done.\n",
      "[Task 13/25]  Current/Best:  113.98/ 220.62 GFLOPS | Progress: (20/20) | 10.72 s Done.\n",
      "[Task 14/25]  Current/Best:   70.46/ 215.15 GFLOPS | Progress: (20/20) | 18.78 s Done.\n",
      "[Task 15/25]  Current/Best:   74.15/ 136.20 GFLOPS | Progress: (20/20) | 18.03 s Done.\n",
      "[Task 16/25]  Current/Best:  116.13/ 148.96 GFLOPS | Progress: (20/20) | 9.49 s Done.\n",
      "[Task 17/25]  Current/Best:  106.93/ 213.26 GFLOPS | Progress: (20/20) | 10.92 s Done.\n",
      "[Task 18/25]  Current/Best:  192.79/ 192.79 GFLOPS | Progress: (20/20) | 10.92 s Done.\n",
      "[Task 19/25]  Current/Best:  150.13/ 336.74 GFLOPS | Progress: (20/20) | 12.62 s Done.\n",
      "[Task 20/25]  Current/Best:  141.06/ 265.05 GFLOPS | Progress: (20/20) | 13.02 s Done.\n",
      "[Task 22/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s s Done.\n",
      "[Task 22/25]  Current/Best:   69.82/ 127.29 GFLOPS | Progress: (20/20) | 11.28 s Done.\n",
      "[Task 23/25]  Current/Best:  114.11/ 331.16 GFLOPS | Progress: (20/20) | 10.99 s Done.\n",
      "[Task 25/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s s Done.\n",
      "[Task 25/25]  Current/Best:    6.96/  47.50 GFLOPS | Progress: (20/20) | 14.70 s Done.\n"
     ]
    }
   ],
   "source": [
    "# 首先从模型中提取任务\n",
    "tasks = autotvm.task.extract_from_program(mod[\"main\"], target=target, params=params)\n",
    "\n",
    "# 按顺序调优提取的任务\n",
    "for i, task in enumerate(tasks):\n",
    "    prefix = f\"[Task {i + 1:2d}/{len(tasks):2d}] \"\n",
    "    tuner_obj = XGBTuner(task, loss_type=\"rank\")\n",
    "    tuner_obj.tune(\n",
    "        n_trial=min(tuning_option[\"trials\"], len(task.config_space)),\n",
    "        early_stopping=tuning_option[\"early_stopping\"],\n",
    "        measure_option=tuning_option[\"measure_option\"],\n",
    "        callbacks=[\n",
    "            autotvm.callback.progress_bar(tuning_option[\"trials\"], prefix=prefix),\n",
    "            autotvm.callback.log_to_file(tuning_option[\"tuning_records\"]),\n",
    "        ],\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 用调优数据编译优化后的模型\n",
    "\n",
    "作为上述调优过程的输出，我们获得了存储在 ``resnet-50-v2-autotuning.json`` 的调优记录。编译器将使用这些结果，在你指定的目标上为模型生成高性能代码。\n",
    "\n",
    "现在，模型的调优数据已经收集完毕，可以使用优化的算子重新编译模型，以加快计算速度。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "with autotvm.apply_history_best(tuning_option[\"tuning_records\"]):\n",
    "    with tvm.transform.PassContext(opt_level=3, config={}):\n",
    "        lib = relay.build(mod, target=target, params=params)\n",
    "\n",
    "dev = tvm.device(str(target), 0)\n",
    "module = graph_executor.GraphModule(lib[\"default\"](dev))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "验证优化后的模型是否运行并产生相同的结果：\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "class='tiger cat' with probability=0.526640\n",
      "class='tabby, tabby cat' with probability=0.403286\n",
      "class='Egyptian cat' with probability=0.036492\n",
      "class='tiger, Panthera tigris' with probability=0.004262\n",
      "class='plastic bag' with probability=0.002361\n"
     ]
    }
   ],
   "source": [
    "dtype = \"float32\"\n",
    "module.set_input(input_name, img_data)\n",
    "module.run()\n",
    "output_shape = (1, 1000)\n",
    "tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).numpy()\n",
    "\n",
    "scores = softmax(tvm_output)\n",
    "scores = np.squeeze(scores)\n",
    "ranks = np.argsort(scores)[::-1]\n",
    "for rank in ranks[0:5]:\n",
    "    print(\"class='%s' with probability=%f\" % (labels[rank], scores[rank]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 比较已调谐和未调谐的模型\n",
    "\n",
    "我们想收集一些与这个优化模型相关的基本性能数据，将其与未优化的模型进行比较。根据你的底层硬件、迭代次数和其他因素，你应该看到优化后的模型与未优化的模型相比有性能的提高。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "optimized: {'mean': 28.78813438117504, 'median': 28.749230003450066, 'std': 0.14753801472029784}\n",
      "unoptimized: {'mean': 31.967710580211133, 'median': 31.849463004618883, 'std': 0.2711562455644498}\n"
     ]
    }
   ],
   "source": [
    "import timeit\n",
    "\n",
    "timing_number = 10\n",
    "timing_repeat = 10\n",
    "optimized = (\n",
    "    np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))\n",
    "    * 1000\n",
    "    / timing_number\n",
    ")\n",
    "optimized = {\"mean\": np.mean(optimized), \"median\": np.median(optimized), \"std\": np.std(optimized)}\n",
    "\n",
    "\n",
    "print(\"optimized: %s\" % (optimized))\n",
    "print(\"unoptimized: %s\" % (unoptimized))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 小结\n",
    "\n",
    "在本教程中，我们举了一个简短的例子，说明如何使用 TVM Python API 来编译、运行和调整一个模型。我们还讨论了对输入和输出进行预处理和后处理的必要性。在调优过程之后，我们演示了如何比较未优化和优化后的模型的性能。\n",
    "\n",
    "这里我们介绍了使用 ResNet-50 v2 本地的简单例子。然而，TVM 支持更多的功能，包括交叉编译、远程执行和剖析/基准测试。\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.8.13 ('py38': conda)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  },
  "vscode": {
   "interpreter": {
    "hash": "28558e8daad512806f5c536a1a04c119185f99f65b79002708a12162d02a79c7"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
